74 research outputs found

Machine Learning based Protein Sequence to (un)Structure Mapping and Interaction Prediction

Author: Iqbal Sumaiya
Publication venue: ScholarWorks@UNO
Publication date: 09/08/2017

Proteins are the fundamental macromolecules within a cell that carry out most biological functions. The computational study of protein structure and function, using machine learning and data analytics, is essential to advancing life-science research, given the fast-growing volume of biological data and the complexity involved in analyzing it to discover meaningful insights. Mapping of a protein’s primary sequence is not limited to its structure; we extend it to the disordered component, known as Intrinsically Disordered Proteins or Regions (IDPs/IDRs), and hence to the dynamics involved, which help us explain complex interactions within a cell that are otherwise obscured. The objective of this dissertation is to develop effective machine learning based tools to predict disordered proteins, their properties and dynamics, and their interaction paradigm by systematically mining and analyzing large-scale biological data. In this dissertation, we propose a robust framework to predict disordered proteins given only sequence information, using an optimized SVM with RBF kernel. Through appropriate reasoning, we highlight the structure-like behavior of IDPs in disease-associated complexes. Further, we develop a fast and effective predictor of Accessible Surface Area (ASA) of protein residues, a useful structural property that defines a protein’s exposure to partners, using regularized regression with a 3rd-degree polynomial kernel function and a genetic algorithm. As a key outcome of this research, we then introduce a novel method to extract position specific energy (PSEE) of protein residues by modeling the pairwise thermodynamic interactions and hydrophobic effect. PSEE is found to be an effective feature in identifying the enthalpy-gain of the folded state of a protein and otherwise the neutral state of the unstructured proteins.
Moreover, we study the transient peptide-protein interactions that involve the induced folding of short peptides through disorder-to-order conformational changes to bind to an appropriate partner. Using the stacked generalization ensemble technique, we develop a suite of predictors to identify, from protein sequence, the residue patterns of Peptide-Recognition Domains that can recognize and bind to peptide motifs and to phospho-peptides with post-translational modifications (PTMs) of amino acids, which are responsible for critical human diseases. The biologically relevant case studies involved demonstrate the possibility of discovering new knowledge using the developed tools.

University of New Orleans

DisPredict: A Predictor of Disordered Protein Using Optimized RBF Kernel

Author: Hoque Md Tamjidul
Iqbal Sumaiya
Publication venue: ScholarWorks@UNO
Publication date: 01/01/2015

Intrinsically disordered proteins or regions perform important biological functions through their dynamic conformations during binding. Thus, accurate identification of these disordered regions has significant implications for proper annotation of function, induced fold prediction and drug design to combat critical diseases. We introduce DisPredict, a disorder predictor that employs a single support vector machine with RBF kernel and novel features for reliable characterization of protein structure. DisPredict yields effective performance. In addition to 10-fold cross validation, training and testing of DisPredict were conducted with independent test datasets. The results were consistent, with both the training and test error minimal. The use of multiple data sources makes the predictor generic. The datasets used in developing the model include disordered regions of various lengths, categorized as short and long, with different compositions and different types of disorder, ranging from fully to partially disordered regions, as well as completely ordered regions. Through comparison with other state-of-the-art approaches and case studies, DisPredict is found to be a useful tool with competitive performance. DisPredict is available at https://github.com/tamjidul/DisPredict_v1.0

Directory of Open Access Journals

PubMed Central

University of New Orleans

Rare De Novo Missense Variants in RNA Helicase DDX6 Cause Intellectual Disability and Dysmorphic Features and Lead to P-Body Defects and RNA Dysregulation

Author: Balak Chris
Benard M.
Ernoult-Lange Michele
Iqbal Sumaiya
Pfundt R.P.
Piton A.
Ramsey K.
Rump Patrick
Schaefer E.
Weil D.
Publication venue: Elsevier BV
Publication date: 01/01/2019

Item does not contain fulltext

Radboud Repository

Characterization of intrinsically disordered regions in proteins informed by human genetic diversity

Author: Ahmed Shehab S
Campbell Arthur J
Dunker A Keith
Iqbal Sumaiya
Lohia Ruchi
Rahman M Sohel
Rifat Zaara T
Publication venue: Public Library of Science (PLoS)
Publication date: 11/03/2022

All proteomes contain both proteins and polypeptide segments that don't form a defined three-dimensional structure yet are biologically active, called intrinsically disordered proteins and regions (IDPs and IDRs). Most of these IDPs/IDRs lack useful functional annotation, limiting our understanding of their importance for organism fitness. Here we characterized IDRs using protein sequence annotations of functional sites and regions available in the UniProt knowledgebase ("UniProt features": active site, ligand-binding pocket, regions mediating protein-protein interactions, etc.). By measuring the statistical enrichment of twenty-five UniProt features in 981 IDRs of 561 human proteins, we identified eight features that are commonly located in IDRs. We then collected the genetic variant data from the general population and patient-based databases and evaluated the prevalence of population and pathogenic variations in IDPs/IDRs. We observed that some IDRs tolerate 2 to 12 times more single amino acid-substituting missense mutations than synonymous changes in the general population. However, we also found that 37% of all germline pathogenic mutations are located in disordered regions of 96 proteins. Based on the observed-to-expected frequency of mutations, we categorized 34 IDRs in 20 proteins (DDX3X, KIT, RB1, etc.) as intolerant to mutation. Finally, using statistical analysis and a machine learning approach, we demonstrate that mutation-intolerant IDRs carry a distinct signature of functional features. Our study presents a novel approach to assign functional importance to IDRs by leveraging the wealth of available genetic data, which will aid in a deeper understanding of the role of IDRs in biological processes and disease mechanisms.

Cold Spring Harbor Laboratory Institutional Repository

IUPUIScholarWorks

PubMed Central

MISCAST : MIssense variant to protein StruCture Analysis web SuiTe

Author: Ahmed Shehab S.
Campbell Arthur J.
Cottrell Jeffrey R.
Daly Mark J.
Heyne Henrike O.
Hoksza David
Iqbal Sumaiya
Jespersen Jakob B.
Lal Dennis
May Patrick
Perez-Palma Eduardo
Rahman M. Sohel
Rifat Zaara T.
Wagner Florence F.
Publication date: 01/01/2020

Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires a particular investigation of amino acid substitutions in the context of protein structure and function. Answers to questions like 'Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?', or 'Is a variant changing an amino acid located at the protein core or part of a cluster of known pathogenic mutations in 3D?' are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features has been aggregated in MISCAST from multiple databases and displayed on structures alongside the variants to provide users with the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by a wide biological community.

Crossref

Kölner UniversitätsPublikationsServer

Helsingin yliopiston digitaalinen arkisto

Open Repository and Bibliography - Luxembourg

Online Research Database In Technology

Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants

Author: Ahmed Shehab S.
Campbell Arthur J.
Cottrell Jeffrey R.
Daly Mark J.
Heyne Henrike O.
Hoksza David
Iqbal Sumaiya
Jespersen Jakob B.
Lage Kasper
Lal Dennis
May Patrick
Palotie Aarno
Perez-Palma Eduardo
Rahman M. Sohel
Rifat Zaara T.
Wagner Florence F.
Publication date: 01/01/2020

Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations' positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants' pathogenicity in terms of the perturbed molecular mechanisms.

Kölner UniversitätsPublikationsServer

Helsingin yliopiston digitaalinen arkisto

Open Repository and Bibliography - Luxembourg

Insights into protein structural, physicochemical, and functional consequences of missense variants in 1,330 disease-associated human genes

Author: Ahmed Shehab S.
Campbell Arthur C.
Cottrell Jeffrey R.
Daly Mark J.
Heyne Henrike O.
Hoksza David
Iqbal Sumaiya
Jespersen Jakob B.
Lage Kasper
Lal Dennis
May Patrick
Palotie Aarno
Perez-Palma Eduardo
Rahman M. Sohel
Rifat Zaara T.
Wagner Florence F.
Publication venue: Cold Spring Harbor Laboratory
Publication date: 04/07/2019

Inference of the structural and functional consequences of amino acid-altering missense variants is challenging and not yet scalable. Clinical and research applications of the colossal number of identified missense variants are thus limited. Here we describe the aggregation and analysis of large-scale genomic variation and structural biology data for 1,330 disease-associated genes. Comparing the burden of 40 structural, physicochemical, and functional protein features of altered amino acids with 3-dimensional coordinates, we found 18 and 14 features that are associated with pathogenic and population missense variants, respectively. Separate analyses of variants from 24 protein functional classes revealed novel function-dependent vulnerable features. We then devised a quantitative spectrum, identifying variants with higher pathogenic variant-associated features. Finally, we developed a web resource (MISCAST; http://miscast.broadinstitute.org/) for interactive analysis of variants on linear and tertiary protein structures. The biological impact of missense variants available through the webtool will assist researchers in hypothesizing variant pathogenicity and disease trajectories.

Open Repository and Bibliography - Luxembourg

Critical assessment of protein intrinsic disorder prediction

Author: Aykac-Fas Burcu
Bassot Claudio
Benítez Guillermo Ignacio
Bevilacqua Martina
Bitard-Feildel Tristan
Caid Predictors
Callebaut Isabelle
Chasapi Anastasia
Chemes Lucia Beatriz
Cheng Jianlin
Cozzetto Domenico
Davey Norman
Davidović Radoslav
Disprot Curators
Dosztányi Zsuzsanna
Dunker A. Keith
Elofsson Arne
Erdős Gábor
Galzitskaya Oxana Valerianovna
Gao Jianzhao
González-Foutel Nicolás S.
Govindarajan Sudha
Gsponer Jörg
Guharoy Mainak
Hajdu-Soltész Borbála
Hanson Jack
Hatos András
Hoque Md Tamjidul
Horvath Tamas
Hu Gang
Iglesias Valentin
Iqbal Sumaiya
Jones David T.
Kajava Andrey V.
Kovacs Orsolya Panna
Kurgan Lukasz
Lamb John
Lambrughi Matteo
Lazar Tamas
Leclercq Jeremy Y.
Leonardi Emanuela
Litfin Thomas
Lobanov Michail Yu
Macedo-Ribeiro Sandra
Macossay-Castillo Mauricio
Maiani Emiliano
Malhis Nawar
Manso Jose Antonio
Marino-Buslje Cristina
Martínez-Pérez Elizabeth
Meng Fanchi
Minervini Giovanni
Mirabello Claudio
Mičetić Ivan
Monzon Alexander Miguel
Murvai Nikoletta
Mészáros Bálint
Necci Marco
Orlando Gabriele
Ouzounis Christos
Pajkos Mátyás
Paladin Lisanna
Paliwal Kuldip
Palopoli Nicolás
Pancsa Rita
Papaleo Elena
Parisi Gustavo
Peng Zhenling
Pereira Pedro José Barbosa
Piovesan Damiano
Promponas Vasilis J.
Pujols Jordi
Quaglia Federica
Raimondi Daniele
Salvatore Marco
Schad Eva
Sharma Alok
Sharma Ronesh
Sormanni Pietro
Szabo Beata
Szaniszló Tamás
Tamana Stella
Tantos Agnes
Tompa Peter
Tosatto Silvio C. E.
Veljkovic Nevena
Vendruscolo Michele
Ventura Salvador
Vranken Wim
Wallner Björn
Walsh Ian
Wang Chen
Wang Kui
Wang Sheng
Wu Tianqi
Wu Zhonghua
Xu Jinbo
Yan Jing
Zhou Yaoqi
Álvarez Lucía
Publication venue: Nature Methods
Publication date: 01/01/2021

Intrinsically disordered proteins, defying the traditional protein structure–function paradigm, are a challenge to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that their accuracy is high. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in prediction of intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has Fmax = 0.483 on the full dataset and Fmax = 0.792 following filtering out of bona fide structured regions. Disordered binding regions remain hard to predict, with Fmax = 0.231. Interestingly, computing times among methods can vary by up to four orders of magnitude.
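
For readers unfamiliar with the Fmax values quoted in this abstract: Fmax is commonly taken to be the maximum F1 score (harmonic mean of precision and recall) over all decision thresholds applied to a predictor's per-residue scores. The sketch below illustrates that general definition only. The function name `f_max`, the toy scores, and the toy labels are hypothetical, and the exact CAID evaluation protocol (for example, how residues are pooled across proteins) is not described in the abstract and may differ.

```python
def f_max(scores, labels):
    """Return (best F1, threshold) over all thresholds drawn from the scores.

    scores: per-residue disorder scores (higher means more likely disordered).
    labels: 1 for disordered residues, 0 for ordered residues.
    A residue is predicted disordered when its score is >= the threshold.
    """
    best_f1, best_threshold = 0.0, None
    for threshold in sorted(set(scores)):
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
        fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
        fn = sum(1 for s, y in pairs if s < threshold and y == 1)
        if tp == 0:
            continue  # F1 is zero (or undefined) without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_threshold = f1, threshold
    return best_f1, best_threshold


# Toy, made-up example: five residues with scores and true labels.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0]
best_f1, best_threshold = f_max(toy_scores, toy_labels)
print(f"Fmax = {best_f1:.3f} at threshold {best_threshold}")
```

Because the threshold is chosen to maximize F1, Fmax reports a predictor's best-case balance of precision and recall rather than its performance at a fixed cutoff.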

CONICET Digital

HAL-IRD

Diposit Digital de Documents de la UAB

Apollo (Cambridge)

Trouble Walking: A Pain in the Neck Diagnosis

Author: Iqbal Sumaiya
Publication venue: RocScholar
Publication date: 21/09/2021

Trouble Walking: A Pain in the Neck Diagnosis, Sumaiya Iqbal, M.D., Unity Faculty Partners. Outline: Case 1 Room for improvement Case 2 Review diagnosis Treatment Take home point

RocScholar (Rochester Regional Health)